VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR
Many decision-making tasks in which both accuracy and efficiency matter still require human supervision. For example, a traffic officer reviewing hour-long dashcam footage or a researcher screening conference videos can benefit from concise summaries that reduce cognitive load and save time. Yet current vision-language models (VLMs) often produce verbose, redundant outputs that hinder task performance. Existing video-caption evaluation depends on costly human annotations and overlooks how useful the summaries are for downstream tasks. We address these gaps with Video-to-text Information Bottleneck Evaluation (VIBE), an annotation-free method that scores VLM outputs with two metrics: grounding (how well the summary aligns with the visual content) and utility (how informative it is for the task). VIBE ranks randomly sampled VLM outputs by these two scores and selects among them to support effective human decision-making. Human studies on LearningPaper24, SUTD-TrafficQA, and LongVideoBench show that summaries selected by VIBE consistently improve performance, boosting task accuracy by up to 61.23% and reducing response time by 75.77% compared with naive VLM summaries or raw video.
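The abstract says VIBE ranks sampled outputs by grounding and utility but does not say how the two rankings are combined. The sketch below shows only that selection step, under the assumption that the two scores are already computed and that candidates are combined by a simple rank sum (ties broken by grounding). The `Candidate` class, the score values, and the rank-sum rule are all hypothetical illustrations, not VIBE's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    grounding: float  # hypothetical score: alignment with the visual content
    utility: float    # hypothetical score: informativeness for the downstream task


def rank_positions(scores):
    """Return each item's rank (0 = best) when higher scores are better."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position
    return ranks


def select_summary(candidates):
    """Pick the candidate with the smallest grounding-rank + utility-rank sum,
    breaking ties by grounding rank. Rank-sum aggregation is an assumption."""
    g = rank_positions([c.grounding for c in candidates])
    u = rank_positions([c.utility for c in candidates])
    best = min(range(len(candidates)), key=lambda i: (g[i] + u[i], g[i]))
    return candidates[best]


candidates_demo = [
    Candidate("A", grounding=0.9, utility=0.2),
    Candidate("B", grounding=0.8, utility=0.8),
    Candidate("C", grounding=0.3, utility=0.9),
    Candidate("D", grounding=0.5, utility=0.4),
]
print(select_summary(candidates_demo).text)  # → B
```

Here "A" is best grounded and "C" is most useful, but "B" is the only candidate that ranks well on both, so a rank-sum rule prefers it. A weighted score sum or a Pareto filter would be equally plausible stand-ins.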
Restricted Spectral Gap Decomposition for Simulated Tempering Targeting Mixture Distributions
Simulated tempering is a widely used strategy for sampling from multimodal distributions. In this paper, we consider simulated tempering combined with an arbitrary local Markov chain Monte Carlo sampler and present a new decomposition theorem that provides a lower bound on the restricted spectral gap of the algorithm for sampling from mixture distributions. Working with the restricted spectral gap extends our results to broader settings, such as when the usual spectral gap is difficult to bound or becomes degenerate. We demonstrate the application of our theoretical results by analyzing simulated tempering combined with random walk Metropolis-Hastings for sampling from mixtures of Gaussian distributions. Our complexity bound scales polynomially with the separation between modes, logarithmically with 1/ε, where ε denotes the target accuracy in total variation distance, and exponentially with the dimension d.
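For concreteness, here is a minimal sketch of the kind of sampler analyzed above: simulated tempering whose local move is random walk Metropolis-Hastings, run on a two-component 1D Gaussian mixture. It illustrates the algorithm only, not the paper's spectral-gap analysis. The inverse temperatures, step sizes, mixture parameters, and uniform level weights are illustrative assumptions; any positive level weights leave the β = 1 marginal exact, but they affect how much time the chain spends at each level.

```python
import math
import random


def log_mixture_density(x, means=(-4.0, 4.0), sd=1.0):
    """Log density of an equal-weight 1D Gaussian mixture (illustrative target)."""
    comps = [-0.5 * ((x - m) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi)) for m in means]
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps)) - math.log(len(means))


def simulated_tempering(log_target, betas, n_iter, x0=0.0, step=1.0, log_weights=None, seed=0):
    """Simulated tempering with a random walk Metropolis-Hastings (RWMH) local move.

    The joint target over (x, k) is proportional to
    exp(log_weights[k] + betas[k] * log_target(x)); betas[-1] should be 1.
    Returns the x values recorded while the chain sits at the last level.
    """
    rng = random.Random(seed)
    n_levels = len(betas)
    if log_weights is None:
        # Uniform pseudo-prior (assumption): still exact at beta = 1, but the
        # chain then spends most of its time at the hot levels.
        log_weights = [0.0] * n_levels
    x, k = x0, 0
    lp = log_target(x)
    samples = []
    for _ in range(n_iter):
        # Local move: RWMH targeting pi(x)**betas[k]; hotter levels use wider steps.
        y = x + rng.gauss(0.0, step / math.sqrt(betas[k]))
        lp_y = log_target(y)
        if rng.random() < math.exp(min(0.0, betas[k] * (lp_y - lp))):
            x, lp = y, lp_y
        # Temperature move: propose k - 1 or k + 1 with probability 1/2 each.
        j = k + rng.choice((-1, 1))
        if 0 <= j < n_levels:
            log_acc = (betas[j] - betas[k]) * lp + log_weights[j] - log_weights[k]
            if rng.random() < math.exp(min(0.0, log_acc)):
                k = j
        if k == n_levels - 1:
            samples.append(x)
    return samples


samples = simulated_tempering(log_mixture_density, betas=[0.1, 0.3, 1.0], n_iter=100_000)
frac_right = sum(s > 0 for s in samples) / len(samples)
print(len(samples), round(frac_right, 2))  # the right-mode fraction should be near 0.5
```

At β = 0.1 the barrier between the modes at ±4 is almost flat, so the chain crosses it easily there and carries both modes back up to β = 1. Plain RWMH with step 1 at β = 1 would rarely cross.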
Evaluating the Inductive Abilities of Large Language Models: Why Chain-of-Thought Reasoning Sometimes Hurts More Than Helps
Large Language Models (LLMs) have shown remarkable progress across domains, yet their ability to perform inductive reasoning (inferring latent rules from sparse examples) remains limited. It is often assumed that chain-of-thought (CoT) prompting, as used in Large Reasoning Models (LRMs), enhances such reasoning. We investigate this assumption by creating four controlled, diagnostic game-based tasks (chess, Texas Hold'em, dice games, and blackjack) with hidden human-defined rules. We find that CoT reasoning can degrade inductive performance, with LRMs often underperforming their non-reasoning counterparts. To explain this, we present a theoretical framework that reveals how reasoning steps can amplify error through three failure modes: incorrect sub-task decomposition, incorrect sub-task solving, and incorrect final answer summarization. Based on our theoretical and empirical analysis, we introduce structured interventions that adapt CoT generation according to the identified failure types. These interventions improve inductive accuracy without retraining. Our findings suggest that effective CoT reasoning depends not only on taking more steps but also on ensuring those steps are well structured.
The Most Promising Ebola Vaccine Has Been Sitting on the Shelf for 15 Years
Years after initial tests, researchers are now racing to see if a vaccine developed in 2011 can help fight the current Bundibugyo outbreak in Congo. Fever was the first symptom to grip the crab-eating macaques in their high-containment laboratory on an island off Texas after they were infected with the newly discovered Bundibugyo strain of Ebola. Then came the weight loss, the rectal bleeding and nosebleeds, while scientists in space suits drew blood to see how the monkeys' immune systems struggled to fight the aggressive virus. But the three monkeys that had received a newly developed vaccine to protect against the understudied strain showed no symptoms of the disease, which eventually killed two-thirds of their unvaccinated companions. It was 2011, and virologist Thomas Geisbert's work developing the vaccine was done.
The Growing Political Power of Anti-Data Center Activists
Generating Importance Samples for Risk-Averse Downstream Tasks (NeurIPS 2025, camera-ready)
Risk-averse modeling is critical in safety-sensitive and high-stakes applications. Conditional Value-at-Risk (CVaR) quantifies such risk by measuring the expected loss in the tail of the loss distribution, and minimizing it provides a principled framework for training robust models. However, direct CVaR minimization remains challenging because rare, high-loss events are difficult to estimate accurately, particularly at extreme quantiles. In this work, we propose a novel training framework that synthesizes informative samples for CVaR optimization using score-based generative models. Specifically, we guide a diffusion-based generative model to sample from a reweighted distribution that emphasizes inputs likely to incur high loss under a pretrained reference model. These samples are then incorporated via a loss-weighted importance sampling scheme to reduce noise in stochastic optimization. We establish convergence guarantees and show that the synthesized, high-loss-emphasized dataset contributes substantially to the noise reduction. Empirically, we validate the effectiveness of our approach across multiple settings, including a real-world wireless channel compression task, where our method achieves significant improvements over standard risk minimization strategies.
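To make the objective concrete, the sketch below estimates CVaR with the Rockafellar-Uryasev form CVaR_α(L) = min_t { t + E[(L − t)_+] / (1 − α) }. It also shows how self-normalized importance weights p(x)/q(x) let samples from a tail-emphasizing proposal q still estimate CVaR under the target p. This is a generic illustration, not the paper's diffusion-guided sampler or its loss-weighted scheme. The Gaussian target, the shifted proposal N(1, 1), and α = 0.95 are assumptions chosen so the exact answer is known (about 2.063 for a standard normal loss).

```python
import numpy as np


def cvar(losses, alpha, weights=None):
    """Estimate CVaR_alpha via the Rockafellar-Uryasev form
    CVaR_alpha(L) = min_t { t + E[(L - t)_+] / (1 - alpha) }.

    `weights` are optional importance weights p(x)/q(x) for samples drawn from a
    proposal q; they are self-normalised. With weights=None this is plain Monte Carlo.
    """
    losses = np.asarray(losses, dtype=float)
    w = np.ones_like(losses) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # A minimiser t is the alpha-quantile (VaR) of the weighted empirical distribution.
    order = np.argsort(losses)
    cdf = np.cumsum(w[order])
    idx = min(int(np.searchsorted(cdf, alpha)), len(losses) - 1)
    var = losses[order][idx]
    return var + np.sum(w * np.maximum(losses - var, 0.0)) / (1.0 - alpha)


rng = np.random.default_rng(0)
alpha, n = 0.95, 200_000

# Plain Monte Carlo: losses drawn directly from the target p = N(0, 1).
plain = cvar(rng.standard_normal(n), alpha)

# Importance sampling: draw from a tail-shifted proposal q = N(1, 1) and
# reweight by p(x)/q(x) = exp(0.5 - x) so the estimate still targets p.
x_q = rng.normal(1.0, 1.0, n)
weighted = cvar(x_q, alpha, weights=np.exp(0.5 - x_q))

print(round(plain, 3), round(weighted, 3))  # both should be close to about 2.063
```

Under q, roughly a quarter of the samples exceed the 95% quantile of p, compared with 5% under plain sampling. This is the basic reason a proposal that emphasizes high-loss inputs can reduce estimator noise in the tail.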
Hot pavement can burn your dog's paws
A new app is on a mission to save puppy paws from scorched concrete. Monitoring temperature is important for protecting your pooch's paws.